Goethe University

Data Science and Marketing Analysis, Yelp Dataset

A - Preparation

Read Libraries

1 Check-in

1.1- Read & Normalize File

1.2- Create Every Unique Day As A List

1.3- Transpose for Daily #s by Business

1.4- Weekly Check-ins

1.5- Monthly Check-ins

1.6- Filtering Businesses with At Least 1 Monthly Check-in

1.7- Total # of Check-ins for Each Day

1.8- Create Check-in Dates to Plot

1.9- Plot Check-ins by Weekday and Year

1.10- Create Check-in Frequency

2 Business

Read & Normalize

2.1 Add Chain Data

2.2 Filter only for 2015

2.3 Distance to Center

2.4 Add Income & Population

2.5 Attributes Fix for Parking & Ambiance & GoodForMeal

2.6 Wifi & Alcohol Fix

2.7 Categories List to Columns

2.8 Fix City Names

2.9 Additional Changes for Attributes

3 Photo

3.1 Photo Compliment

3.2 Photo Table

4 Review & Users

4.1 Review - Read and Filter for 2015 & NZ Businesses

4.1.2 Create b_id_NZ_df_3

4.2 Read Users

4.3 Add Gender for Users

Fix Genders to Categorize

4.4 Filter Users for 2015

Gender Graph

4.4.1 Add Users' Number of Reviews in 2015

4.4.2 Create NZ User List

4.5 Merge User and Review Files, for 2015

4.5.1 Create b_id_NZ_df_4

4.6 Sentiment Analysis on Reviews

4.7 Rev_Users AVG

4.9 Daily Review Numbers

5 Tips

5.1 Sentiment Analysis on Tips

5.2 Daily Tip Numbers

B - Descriptive Analysis

Read Files

Number of Businesses in Each City

Ch_freq for Each City

Business Categories Plot

Check-in Freq by Categories

Check-in by Price Range

Stars Plot

Check-ins by Day & Month

WordClouds for Tips & Reviews

Sentiment Analysis Graphs for Tips & Reviews

AVG Sentiment by Check-in

Rev_AVG

Rev_AVG to LaTeX

Income and Check-freq by ZipCode

Income Distribution Plot

Reviews by Gender

HeatMap of Restaurant Locations

Income & Dist & Check Plots

Attributes Check

Null Count and Plot Attributes

Show Selected Attributes

LaTeX for Physical Attributes

Create Numerical Analysis DataFrame

C - Prediction

Fix Files before Prediction

Weather Data

Drop Businesses without Weather Data

Create b_id_NZ_df_5

Bus Pred

Frac Part

Create 30 Random Dates, 25% of Businesses

Review

Tips

Check-in

Full Part

Create Full Data

Add Daily Reviews, Tips, Check-ins

Drop Data Without Prcp

Corr Table for Whole Data

Scaling

D - ML

Read File & Libraries

Correlation HeatMap for All Data

Split Data

Results DF

ML Models

1. KNN

2. Linear Regression

3. Naive Bayes

4. SVM

5. Neural Network

6. Random Forest

7. XGBoost

Feature Imp. Plot for RF & XGBoost

7.1 XGBRF

8. Decision Tree

9. Logistic Regression

10. Stacked

Parameter DF

Results DF

Plot ROC Graph

LaTeX

Gini and TDL

Gini Func

TDL